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A  number  of  procedures  have  been  used  to  analyze  nonmonotonic  binary 
data  to  predict  the  probability  of  response.  Some  classical  procedures  are 
the  Up  and  Down  strategy,  the  Robbins-Monro  procedure,  and  other  se¬ 
quential  optimization  designs.  Recently,  nonparametric  procedures  such  as 
kernel  regression  and  local  linear  regression  have  been  applied  to  this  type 
of  data.  It  is  well  known  that  kernel  regression  has  problems  fitting  the  data 
near  the  boundaries,  and  a  drawback  with  local  linear  regression  is  that  it 
may  be  too  linear  when  fitting  data  from  a  curvilinear  function.  This  report 
introduces  a  procedure  called  local  logistic  regression,  which  fits  a  logistic 
regression  function  at  each  of  the  data  points.  United  States  Army  projectile 
data  are  used  in  an  example  that  supports  the  use  of  local  logistic  regres¬ 
sion  for  analyzing  nonmonotonic  binary  data  for  certain  response  curves. 
Properties  of  local  logistic  regression  are  presented  along  with  simulation 
results  that  indicate  some  of  the  strengths  of  the  procedure. 
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1.  Introduction 


Consider  modeling  the  dose-response  curve  for  the  ungrouped  binary  re¬ 
sponse  variable  y.  This  dose-response  curve  represents  the  probability  that 
y  is  1  for  each  given  value  of  “  dose,"  the  single  regressor  v.  In  many  ap¬ 
plications  in  such  areas  as  biology,  industry,  and  business,  this  response 
curve  may  be  modeled  successfully  by  a  monotonic  parametric  function 
such  as  the  normal  or  logistic  cumulative  distribution  function  (cdf).  How¬ 
ever,  in  this  report,  we  present  an  application  where  the  response  curve  is 
nonmonotonic,  rendering  traditional  methods  such  as  logistic  regression 
inappropriate.  We  suggest  the  use  of  local  logistic  regression  (llogr),  a  non- 
parametric  method  that  simultaneously  enables  modeling  a  nonmonotonic 
dose-response  curve  while  maintaining  the  restriction  that  all  estimated 
responses  take  values  between  0  and  1.  The  following  real-life  example 
demonstrates  the  flexibility  and  applicability  of  this  new  procedure. 

An  experiment  of  importance  to  the  development  of  a  kinetic  energy  pen- 
etrator  (a  “  projectile” )  and  the  armor  to  resist  it  is  to  model  the  probability 
that  the  projectile  will  penetrate  the  plate  of  armor  as  a  function  of  the  ve¬ 
locity  of  the  projectile.  Normally,  one  expects  the  probability  of  penetration 
to  increase  as  velocity  increases.  Routinely,  testers  are  asked  to  estimate  the 
V50,  the  velocity  where  the  probability  is  0.50  that  the  projectile  will  pene¬ 
trate  the  armor.  Experimenters  use  variants  of  the  Up  and  Down  strategy 
(Dixon  and  Mood,  1948),  the  Robbins-Monro  (Robbins  and  Monro,  1951), 
or  other  sequential  designs  to  gather  observations,  and  the  maximum  like¬ 
lihood  estimate  (MLE)  for  the  mean  (F50)  is  formed  with  a  logistic  or  nor¬ 
mal  response  function.  An  interlaboratory  study  of  V50  estimation  revealed 
widely  varying  estimates  for  some  penetrator-armor  matches  (Chang  and 
Bodt,  1997).  Initially,  the  blame  was  thought  to  lie  with  varying  test  pro¬ 
cedures  among  experimental  testing  sites.  Recent  careful  study  of  the  data 
suggests  a  phenomenon  at  work  here  that  one  material  scientist  refers  to  as 
the  “  shatter  gap." 

The  shatter  gap  is  not  precisely  defined,  and  the  exact  physical  mechanism 
for  significant  bullet  shattering  is  unknown.  Simply  put,  when  bullet  shat¬ 
tering  occurs  against  certain  armor  materials,  kinetic  energy  is  diffused, 
thereby  reducing  the  ability  of  the  penetrator  to  defeat  the  armor.  This  re¬ 
duction  is  reflected  over  a  range  of  velocity  in  which  a  decrease  in  response 
probability  accompanies  increasing  velocity.  Eventually,  as  velocity  contin¬ 
ues  to  increase,  the  kinetic  energy  developed  is  so  much  an  overmatch  that 
the  probability  of  penetration  rises  again.  One  working  definition,  suitable 
for  data  currently  available,  characterizes  the  shatter  gap  in  terms  of  an  up¬ 
per  and  lower  kso.  and  V50Z,,  respectively.  In  those  data,  penetrator 
shatter  does  not  cause  the  probability  of  penetration  to  decrease  until  after 
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a  V50  is  achieved.  A  second,  higher,  Vsn  is  achieved  by  partially  intact  pene- 
trators  with  increased  velocity.  The  interval  they  bound  is  thought  of  as  the 
shatter  gap. 

These  ideas  are  illustrated  in  figure  1,  showing  the  results  of  firing  303 
projectiles  at  macrocomposite  armor  with  velocities  ranging  from  1432  to 
2773  ft/s.  The  response  variable  is  whether  the  projectile  penetrated  the  ar¬ 
mor  (scored  as  a  “  1”)  or  failed  to  penetrate  the  armor  (scored  as  a  “0”). 
Typical  of  penetrator-against-plate  response  curves,  low  velocities  result  in 
a  low  proportion  of  penetrations,  and  high  velocities  result  in  a  high  pro¬ 
portion  of  penetrations.  Unusual  is  that  over  intermediate  velocities,  the 
proportion  of  penetrations  first  rises  with  velocity,  then  falls,  and  then  rises 
once  more.  Note  especially  the  response  activity  between  2100  and  2300 
ft/s  relative  to  response  activity  for  neighboring  velocities. 

The  shatter  gap  phenomenon  is  more  clearly  seen  from  the  curve  in  figure 
1;  this  curve  (obtained  by  the  method  presented  in  sect.  2)  represents  the 
estimated  probability  of  penetration  as  a  function  of  velocity. 

We  can  represent  the  Uso  graphically  by  extending  a  line  at  the  probability 
of  0.5  over  to  the  estimated  probability  curve  and  dropping  lines  down  to 
the  velocity  axis  at  the  intersection  of  the  line  and  the  curve.  The  interval 
bounded  by  these  two  Vso’s  on  the  velocity  axis  is  the  shatter  gap.  Figure  1 
shows  that  the  shatter  gap  extends  from  Usor  of  approximately  1713  ft/s  to 
Vsof/  of  approximately  2510  ft/s,  a  distance  of  797  ft/s,  far  larger  than  the 
usual  error  of  approximately  1 00  ft/ s. 

In  section  2  we  discuss  several  analytical  methods  to  deal  with  data  of  this 
type. 


Figure  1.  Army 
macrocomposite  data 
(solid  dotsb  illustrating 
shatter  gap 

phenomenon.  Curve  is 
estimated  probability  of 
penetration. 


Velocity  (ft/s) 


2 


2.  Possible  Models 


An  obvious  approach  to  fitting  0  to  1  response  data  is  to  use  the  standard 
logistic  regression  model.  However,  as  will  be  demonstrated,  such  a  model 
will  fail  to  identify  the  shatter  gap  in  the  penetration  data  because  of  the  in¬ 
herent  monotonically  increasing  (or  decreasing)  nature  of  the  logistic  curve. 

Two  more  promising  possible  models  for  fitting  data  of  the  type  given  in 
the  curve  in  figure  1  are  the  cdf  mixture  model  (CDFMM)  (Bodt  and  Chang, 
1997),  a  parametric  method  model  procedure,  and  local  llogr,  a  nonpara- 
metric  method. 

2.1  Parametric  Models 

Consider  the  model 


yi  =  Pivi)  +  Si,  i  = 


where  yi  represents  the  response  variable,  P{vi)  is  the  penetration  proba¬ 
bility  that  is  a  function  of  the  velocity  Vi  at  the  ith  experimental  run,  and  e, 
is  the  random  error  term.  The  response  is  recorded  as  a  “  1”  if  armor  pene¬ 
tration  has  occurred  and  a  “  0”  otherwise.  The  errors  are  assumed  to  have 
expectation  of  zero. 

In  the  logistic  regression  model,  the  form  of  P  (vi)  is  based  on  the  logistic 
cumulative  distribution  function,  where 


P{vi)  =  (l  +  exp  -  {Po  -t-  piVi)  )  =  (l  +  exp  (-v-^))  ^  =  F  (v-^) , 


where  v-  =  (1  vp  and  P  = 


Po 


and  a  prime  indicates  a  transpose. 


,  where  underlining  indicates  a  vector 


The  method  of  maximum  likelihood  is  used  to  obtain  estimates  (MLEs), 
of  the  unknown  parameters  Po  and  pi.  To  obtain  the  MLEs,  the  method 
requires  an  iterative  procedure,  for  example,  the  Gauss-Newton  method 
using  iterated  reweighted  least  squares  (IRLSs).  Details  concerning  logistic 
regression  may  be  found  in  McCullagh  and  Nelder  (1983),  Myers  (1990) 
and  Ryan  (1997),  among  many  others. 

While  the  logistic  regression  fit  guarantees  an  estimated  response  between 
0  and  1,  the  fit,  as  seen  in  figure  2  (dashed  line),  is  entirely  inadequate  to 
the  armor  penetration  data-  resulting  in  a  nearly  flat  line  fit,  completely 
missing  the  up  arid  down  nature  of  the  response  seen  in  the  llogr  fit  (solid 
line). 
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Figure  2.  Linear  logistic 
regression  fit  (dashed 
line)  and  local  linear 
regression  fit  (bold  solid 
line)  to  macrocomposite 
data. 


Velocity  (ft/s) 


The  CDFMM  approach,  using  mixtures  of  three  ccifs,  models  the  penetra¬ 
tion  probability  P{v)  as  a  function  of  velocity  r  as 

P{v)  =  [1  -  Friv:  fii- cr/)  +  Ft{v:  //t-  aT)Fs{v;  //■<?■  cr^) 

where  Ft- Fj,  and  Fs  represent  appropriately  chosen  cdfs  for  the  transition, 
intact,  and  shattered  mechanisms. 

One  interpretation  of  the  above  model  is  to  consider  P{v)  as  the  probabil¬ 
ity  of  response  due  to  the  mixture  of  two  penetration  mechanisms.  The 
first  mechanism  is  penetration  from  an  intact  (7)  projectile.  The  second 
mechanism  is  penetration  from  a  shattered  (5)  projectile.  The  penetrator 
being  intact  or  shattered  is  also  a  function  of  velocity  in  this  formulation, 
and  so  the  proportions  of  the  two  penetration  mechanisms  in  the  mixture 
will  vary  also  with  velocity.  It  is  assumed  in  the  model  that  the  probability 
of  projectile  shatter  will  increase  monotonically.  This  proportion  change  is 
modeled  as  the  transition  (T)  from  the  intact  mechanism  to  the  shattered 
mechanism. 

The  CDFMM  model  fit  to  the  Army  penetration  data  is  illustrated  in  figure 
3,  which  also  shows  for  comparison  the  llogr  fit. 

The  dashed  line  represents  the  CDFMM  fit  using  logistic  cdfs  with  esti¬ 
mated  means  of  1972,  1664,  and  2523  and  estimated  standard  deviations 
of  186,  56,  and  46  for  the  transition,  intact,  and  shattered  mechanisms,  re¬ 
spectively.  The  curves  are  obviously  quite  similar,  with  the  CDFMM  fit 
resulting  in  slightly  larger  (at  around  1800  ft/s)  and  slightly  smaller  (at 
around  2400  ft/s)  penetration  probabilities.  The  shatter  gap  obtained  with 
the  CDFMM  fit  is  only  slightly  wider  than  that  obtained  with  the  llogr  fit. 
The  estimates  are  the  MLEs  for  this  nonlinear  model  with  binary  response, 
an  example  of  a  generalized  nonlinear  model. 

One  problem  with  the  CDFMM  approach  is  that  six  parameter  estimates 
are  required  to  fit  the  model.  This  requires  considerable  effort  to  achieve 
convergence.  We  used  the  SAS  Proc  NLIN  (SAS  Version  7)  with  an  appro¬ 
priately  iterated  weight  matrix  to  obtain  the  fit  for  the  Army  penetration 
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Figure  3.  Comparison  of  1 
fits  between  llogrs  (solid  q  g 
line)  and  cdf  mixture 
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model  (dashed  line), 
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data.  Our  experience  suggests  that  this  procedure  is  extremely  sensitive  to 
the  starting  values,  often  failing  to  converge  if  the  starting  values  are  only 
slightly  different  from  the  values  at  convergence.  Additionally,  the  result¬ 
ing  fit,  as  can  be  seen  in  figure  3,  is  somewhat  inflexible,  resulting  in  fits  that 
look  like  a  mixture  of  logistic  cdfs,  even  when  the  data  do  not  exhibit  such 
a  pattern.  That  is,  the  curve  may  appear  to  be  too  smooth,  as  illustrated  in 
figure  3,  where  the  llogr  method  seems  to  capture  the  wiggle  resulting  from 
the  cluster  of  data  around  2250  ft/s. 

2.2  Nonparametric  Models 

Another  approach  to  modeling  these  data  is  to  use  nonparametric  regres¬ 
sion  techniques.  Such  methods  as  kernel  regression  and  local  linear  regres¬ 
sion  can  trace  out  smooth  curves  with  a  multitude  of  peaks  and  valleys.  The 
shatter  gap  effect  that  must  be  captured  in  the  armor  penetration  example 
could  then  be  identified  by  such  “  local”  fi  tting  techniques. 

Consider  again  the  model 

Hi  =  P  {vi)  +  Ei,  i=l,...,n, 

where  P{vi)  is  no  longer  specified  by  a  parametric  function  such  as  the 
CDFMM  function  given  in  section  2.1.  It  is  assumed  only  that  P{vi)  is  some 
arbitrary  smooth  function.  If  E{ei)  =  0,  then  E(yi)  =  P{vi)  =  Pi,  the  ar¬ 
mor  penetration  probability  at  velocity  Vi.  It  follows  that  yi  =  P{vi)  =  Pi 
is  the  estimated  penetration  probability  at  velocity  Vi.  Kernel  regression  is 
a  method  of  approximating  P{vo)  (where  vq  is  any  arbitrary  velocity  in¬ 
cluding  Vi,  the  velocity  for  the  ith  observation)  using  a  weighting  sequence 
on  the  response  variable,  where  the  weights  are  functions  of  the  distances 
between  the  values  of  the  regressor  variable  and  vq.  One  form  of  the  ker¬ 
nel  weighting  sequence,  proposed  by  Nadaraya  (1964)  and  Watson  (1964), 
assigns  weights  of  to  yj  when  estimating  the  response  at  vq  by 

,  K  (>-■  (t.0  -  Vj)) 

n  ’ 

E  K  (6-1  (vo  -  Vj)) 
i=l 
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where  K{)  represents  some  appropriate  kernel  function  and  b  is  bandwidth. 
The  kernel  estimate  of  response  at  Vj,  Pj',  is  given  by  a  weighted  average  of 
the  observed  jjj’s  as 

fit  =  E  ''hw  ■ 

where  {vi  replaces  vq  in  the  Jiqj  expression  above)  are  the  kernel  weights 
on  the  observations  ijj  for  estimating  the  mean  response  at  velocity  v,,  using 
the  kernel  function  A' (),j  =  1.2. 

All  n  estimates  of  mean  response  may  be  expressed  as  y*’  =  where 
is  the  n  x  1  vector  of  kernel  predictions  and  is  the  n  x  ii  matrix  of 
weights  with  rows  h^'  =  (/i|j  /i  -2  •  •  •  for  '  =  I—*  r'.  The  matrix 

H^,  called  the  kernel  hat  matrix,  plays  a  role  in  kernel  regression  similar  to 
that  of  the  ordinary  least  square  (OLS)  “  hat”  matrix  H  in  linear  regression. 

The  kernel  estimate  of  P(i’i)  depends  on  the  choice  of  the  kernel  func¬ 
tion  K{)  and  the  bandwidth  h.  For  example,  one  popular  choice  for  kernel 
function  is  the  simple  Gaussian  kernel  defined  as  K{u)  —  exp  The 

method  for  selecting  the  bandwidth  is  extremely  critical.  The  magnitude  of 
the  bandwidth  determines  the  smoothness,  or  lack  thereof,  of  the  regres¬ 
sion  function.  One  of  the  more  commonly  used  methods,  referred  to  as  the 
method  of  cross-validation,  selects  the  bandwidth  to  minimize  the  PRESS 
(Allen,  1974)  statistic.  A  related  method  finds  b  by  minimizing  a  penalized 
PRESS  statistic,  called  PRESS*(&),  given  by 

E  (vi-fiLimY 

PRESS-(6)=‘°E_„|g>]  ■ 

where  (6)  is  the  “  minus-i”  predicted  penetration  probability  based  on 
kernel  regression  ly  for  the  current  value  of  b  with  the  ith  observation  re¬ 
moved,  and  tr  is  the  trace  of  the  n  x  n  kernel  weight  matrix  Since 
tr  [H^]  reflects  the  kernel  fits’  “  model  degrees  of  freedom”  (Cleveland, 
1978),  it  is  seen  that  the  denominator  of  PRESS*(6)  penalizes  the  PRESS 
statistic  for  choosing  b  too  small.  Empirical  studies  by  the  authors  and  oth¬ 
ers  (Einsporn  and  Birch,  1993;  Mays,  1995)  have  demonstrated  that  using 
PRESS*(6)  is  often  superior  to  using  PRESS  as  a  bandwidth  selector. 

The  kernel  regression  fit  to  the  armor  penetration  data  results  in  a  smooth 
curve  that  follows  the  up  and  down  pattern  exhibited  in  figures  1  and  3. 
While  this  is  a  desirable  characteristic,  the  resulting  curve  is  not  appealing, 
since  it  is  entirely  possible  that  some  of  the  curve’s  fitted  values  may  lie  out¬ 
side  the  0  to  1  range,  a  natural  restriction  resulting  from  estimating  the  pen¬ 
etration  probabilities.  Indeed,  the  kernel  regression  fit  to  the  Army  macro¬ 
composite  data  results  in  fits  at  the  lower  left  that  are  less  than  zero  and  fits 
to  the  upper  right  that  exceed  one.  This  problem  is  not  avoided  by  the  use  of 
more  complicated  nonparametric  smoothers,  such  as  local  linear  regression 
(Hardle,  1990),  where  each  fit  at  v  =  vo  is  obtained  locally  by  a  weighted 


simple  linear  regression  model  with  the  h^j’s  serving  as  the  weights,  or  lo¬ 
cal  polynomial  regression  (Fan  and  Gijbels,  1996).  These  methods  also  do 
not  restrict  the  resulting  fits  to  between  0  and  1 ,  a  requirement  when  model¬ 
ing  probabilities.  The  local  linear  regression  fit  to  the  macrocomposite  data 
can  be  seen  in  figure  2. 


However,  the  logistic  regression  parametric  method  can  be  combined  with 
the  nonparametric  concept  of  local  fitting  to  produce  a  smooth  curve,  flex¬ 
ible  enough  to  capture  the  up  and  down  patterns  exhibited  in  the  armor 
penetration  data  and  giving  fits  that  are  between  zero  and  one.  We  propose 
local  (linear)  logistic  regression  where  at  each  velocity  v  =  vq,  a  weighted 
linear  logistic  fit  is  obtained  with  Hq^’s  serving  at  the  weights,  in  exactly  the 
same  manner  as  in  local  linear  regression.  That  is,  the  fit  at  v  =  vo  would 
be  obtained  as 


P  (r;o)  =  +  exp  -  +  /5iovo)  )  =  (l  + exp  - 


-1 


The  estimated  coefficients  which  change  values  for  each  v  =  vq,  are 
obtained  via  the  same  IRLS  algorithm  referred  to  earlier  for  logistic  regres¬ 
sion,  with  a  slight  adjustment  to  the  weight  matrix.  Thus,  one  step  of  the 
algorithm  would  compute  the  updated  value  of  as 

where  the  n  x  n  diagonal  matrix  Wq  has  elements 

=  F  (pjPoo)  ^  for  i  =  1, ...,  n  . 

We  see  that  llogr  follows  a  weighting  scheme  that  combines  the  logistic 
weights  with  the  kernel  weights  in  a  local  manner.  The  n  x  1  vector  y*  has 
elements 

f 

the  locally  adjusted  response  to  velocity  Vj.  Upon  convergence  of  the  IRLS 
algorithm,  is  set  equal  to  and  the  estimated  response  is  obtained  as 

Approximate  inferential  information  can  be  obtained  by  straightforward 
application  of  the  variance  operator,  the  multivariate  delta  method,  and  the 
asymptotic  distribution  of  the  MLEs.  For  example,  it  can  be  shown  that  the 

asymptotic  variance  of^^^  is  var  =  {X'WqXY^  {X'WiX)  {X'WoXy\ 
where  the  diagonal  elements  of  the  n  x  n  diagonal  matrix  Wo  are  wqjj  = 

h^j,  and  the  diagonal  elements  of  the  n  x  n  diag¬ 
onal  matrix  Wi  are  wi-.  =  wo.jhj^j.  Using  this  result,  one  can  express  the 
asymptotic  variance  of  each  fit  at  u  =  vo  by  applying  the  delta  method  as 
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It  follows  then  from  the  asymptotic  distribution  of  F  that  one  form 

of  the  approximate  (1  —  q  )  lOO*^  confidence  interval  for  P  (  co)  is 


±2i_a  se(F(^c;4,))  . 


where  the  standard  error  of  F 


is  the  square  root  of  var  (^F 


with  ,8^  replacing  0^  in  this  variance  expression. 


While  this  approach  for  establishing  confidence  intervals  is  a  straightfor¬ 
ward  application  of  Wald  inference,  the  resulting  interval,  (LCL,  UCL),  may 
result  in  lower  confidence  limit  (LCL)  less  than  zero  or  an  upper  confidence 
limit  (UCL)  greater  than  one.  To  avoid  this  problem,  we  can  consider  a  sec¬ 
ond  approach  where  we  obtain  the  (1  —  o)  100%  confidence  interval  for  the 
linear  predictor  by  once  again  applying  the  delta  method.  It  can  be 

shown  that  the  asymptotic  variance  of  the  linear  predictor  is 


var 


=  var 


lin 


with  resulting  (1  -  a)  100%  confidence  interval  as 


iiA  ±  1  =  (LCL*.  UCL*). 

It  follows  that  an  approximate  (1  —  o)  100%  confidence  interval  for  F 
is  (F (LCL*),  F(UCL*)). 

The  general  procedure  is  to  obtain  the  fit  and  corresponding  (1  —  o)  100%. 
confidence  interval  (by  either  method)  for  a  fine  grid  of  values  of -oo  through¬ 
out  the  range  of  velocity  values.  Connecting  the  fits,  lower  confidence  lim¬ 
its,  and  upper  confidence  limits  with  straight  line  segments  results  in  three 
smooth  curves,  the  curve  fit  to  the  data  and  the  two  (1  -  a)  100%)  confi¬ 
dence  bands.  Both  types  of  interval  methods  are  illustrated  in  figures  4  and 
5.  The  method  based  on  the  linear  predictor  results  in  unsatisfactory  wide 
intervals  at  the  boundaries. 

Our  procedure  is  a  specific  example,  though  developed  independently,  of 
the  related  method,  local  polynomial  kernel  regression  of  generalized  lin¬ 
ear  models,  introduced  by  Fan,  Heckman,  and  Wand  (1995). 

Once  the  response  curve  and  the  confidence  bands  are  obtained,  the  F50 
as  well  as  confidence  intervals  for  the  V50  can  be  computed.  Because  it  is 
not  possible  to  obtain  a  closed  functional  form  for  the  estimate  of  P  (oo), 
the  V50  and  confidence  intervals  for  the  V50  are  obtained  through  the  “  in¬ 
verse”  process,  where  a  horizontal  line  is  extended  from  P{v)  —  0.5  to 
the  three  curves,  representing  the  upper  confidence  limit,  the  estimated  re¬ 
sponse  curve,  and  the  lower  confidence  limit.  At  the  intersection,  vertical 
lines  are  dropped  to  the  velocity  axis,  where  the  V50  and  the  95-percent 
confidence  interval  for  the  V50.  (U50/,.  Usof?).  can  be  obtained.  (We  note  that 
if  the  shatter  gap  effect  is  present,  there  will  be  two  l^o  values,  each  with 


Figure  4.  Local  logistic 
regression  fit  to 
macrocomposite  data 
with  95%  confidence 
bands. 
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Figure  5.  Local  logistic 
regression  fit  to 
macrocomposite  data 
using  linear  predictor 
for  confidence  interval. 
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a  corresponding  95-percent  confidence  interval.)  This  process  is  easily  ac¬ 
complished  numerically  wdth  a  computer.  The  IRLS  algorithm  to  compute 
the  llogr  fit  is  outlined  in  appendix  A.  A  SAS  macro  for  calculating  the  llogr 
fit  to  any  set  of  0  to  1  data  may  be  obtained  from  the  authors.  The  complete 
SAS  macro  is  listed  in  appendix  B. 


3.  Some  Simulation  Results 


In  this  section,  we  present  a  small  simulation  study,  generating  binary  data 
from  several  nonmonotonic  dose-response  curves.  Because  the  primary  in¬ 
terest  with  the  Army  penetration  data  is  estimation  of  the  V50  values,  we 
evaluate  the  llogr  procedure  by  comparing  the  estimated  V50  values  with 
the  true  V50  values.  The  true  cdf  used  in  this  evaluation  was  chosen  as  the 
cdf  mixture  model: 


P{v)  =  [1  -  Ft{v;  fj,T,<yT)]Fl{v;  +  Ft{v;  ^T,crT)Fs{v, 


The  values  of  ij,s  were  varied  while  aT,cri,  and  as  were  set 

equal  to  each  other  at  a  constant  value  of  0.05.  The  cdf  representing  Ft,  Fj, 
and  Fs  was  chosen  to  be  the  logistic  cdf.  For  example,  the  cdf  represent¬ 
ing  Ft  can  be  written  as  Fr  =  (1  -h  exp  (-  (a:  -  ht)  /aT))~^.  While  our  full 
simulation  study  considered  a  variety  of  values  of  iJiT,P-i<  and  jxs,  for  the 
sake  of  brevity  we  show  only  the  results  for  the  three  combinations  given 
in  table  1  and  used  to  generate  the  three  curves  displayed  in  figure  6.  As 
seen  in  figure  6,  each  curve  has  two  T50  values. 


Table  1.  Parameter 
values  for  simulated 

Curve 

M/ 

MS 

1 

0.4 

0.2 

0.9 

curves. 

2 

0.4 

0.2 

0.7 

3 

0.4 

0.2 

0.6 

Rescaled  velocity  (0  to  1 ) 
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For  the  simulations,  random  binary  responses  were  generated  for  100  evenly 
spaced  velocities  between  the  values  of  0  and  1.  As  can  be  seen  in  fig¬ 
ure  6,  the  velocity  values  have  been  rescaled  between  0  and  1  so  that  the 
value  of  the  bandwidth  would  be  more  meaningful.  The  cdf  mixture  model 
was  simulated  with  SAS  interactive  matrix  language  (IML)  for  each  of  the 
curves  in  table  1.  The  process  was  repeated  500  times,  and  the  average  %) 
values  were  computed  over  the  500  Monte  Carlo  repetitions.  We  view  these 
average  V50  values  as  assessing  the  ability  of  the  llogr  procedure  to  estimate 
the  true  V50  values.  We  used  the  PRESS*  procedure  here  to  obtain  the  band¬ 
width  for  the  llogr  procedure. 

Tables  2  and  3  contain  the  results  from  the  simulations  for  estimating  the 
V50  values.  As  can  be  seen  in  the  fifth  column  of  the  tables,  the  local  logistic 
regression  does  an  exceptional  job  in  estimating  the  lower  and  upper  Vsn 
values  when  they  are  calculable.  The  average  absolute  error  of  each  esti¬ 
mate  is  less  than  0.03.  The  quality  of  these  estimates  indicates  the  goodness 
of  fit  of  the  llogr  procedure  and  its  ability  to  estimate  the  Vsn  values.  The 
excellent  estimation  can  also  be  attributed  to  the  ability  of  the  PRESS*  pro¬ 
cedure  in  finding  an  appropriate  value  of  the  bandwidth  to  fit  this  family 
of  curves. 

Note  also  that  in  the  third  column  of  tables  2  and  3,  the  number  of  V50 
values  that  were  not  calculable  is  in  parentheses.  In  order  for  both  V50/, 
and  V50C/  to  be  estimated,  the  fitted  llogr  curve  must  cross  the  P{v)  =  0.5 
line  three  times,  as  illustrated  in  figure  1.  When  the  llogr  curve  crosses  the 
p(t))  =  0.5  curve  line  only  once,  then  only  one  of  the  V50  values  can  be  esti¬ 
mated:  V50Z.  if  the  crossing  occurs  at  lower  values  of  velocity,  and  1^50^  if  the 
crossing  occurs  at  upper  values  of  velocity  (with  the  other  V50  value  being 
incalculable).  Data  generated  from  a  curve  similar  to  curve  1  in  figure  6  will 
more  frequently  fail  to  result  in  a  both  V50/,  and  Kson  being  estimable.  Low 
penetration  probabilities  for  high  velocities  in  curve  inhibit  the  movement 
of  the  sequential  design  to  the  highest  velocities  where  a  third  crossing  of 
P(^v)  =  0.5  could  occur.  Note  that  31  percent  of  the  time  (156  samples  out 
of  500),  the  Vsoc;  could  not  be  calculated  for  curve  1:  clearly  an  undesirable 
situation.  One  way  to  resolve  this  problem  would  be  to  increase  the  sample 
size  at  larger  velocities. 


Table  2.  Lower  V50 
estimation  summary 
(incalculable  values  in 
parentheses). 


Table  3.  Upper  V50 
estimation  summary 
(incalculable  values  in 
parentheses). 


Curve 

True  V50L 

Estimate  K^ol 

Average  error 

Average  abs.  error 

1 

0.20190 

0.20827  (41) 

0.00637 

0.02590 

2 

0.20190 

0.20782  (38) 

0.00592 

0.02425 

3 

0.20190 

0.20901  (45) 

0.00711 

0.02363 

Curve 

True  V50L 

Estimate  Vr,oL 

Average  error 

Avgerage  abs.  error 

1 

0.89999 

0.88810  (156) 

-0.01190 

0.02504 

2 

0.69975 

0.69845  (0) 

-0.00130 

0.01834 

3 

0.59810 

0.58604  (22) 

-0.01205 

0.02672 
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4.  Discussion  and  Summary 


The  research  presented  here  has  shown  that  the  llogr  procedure  provides 
a  more  flexible  and  reasonable  approach  to  fitting  binary  data  from  a  non¬ 
monotonic  function  than  the  classic  logistic  regression  procedure,  cdf  mix¬ 
ture  models,  and  some  other  nonparametric  procedures  (kernel  and  local 
linear  regression).  Through  simulating  a  family  of  nonmonotone  functions, 
it  has  been  shown  that  the  llogr  procedure  provides  an  outstanding  es¬ 
timate  of  V50  values  with  very  small  average  and  absolute  errors,  when 
calculable. 
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Appendix  A.  Numerical  Considerations 


The  values  of  the  fits  are  obtained  by  using  the  IRLS  algorithm.  The  proce¬ 
dure  is  outlined  below. 

To  find  the  proper  bandwidth: 

For  each  fixed  value  of  bandwidth  b,  perform  the  following: 

1.  Compute  the  kernel  weights  for  each  value  of  velocity,  v  =  Vi,  for 
i  =  1, . . .  ,n,  as 


2.  At  each  v  =  Vi,i  =  1,. ..  ,n,  compute  initial  coefficients  via  local  lin¬ 
ear  regression  (Hr)  as 

x'Wiv, 

where  the  n  x  n  diagonal  matrix  Wi  has  elements  wjj  =  h^. 

3.  Compute  n  x  1  vector  y*  as  yf,  =  ^ and  update  the  elements 
oftl^as 


=  F  (t.'4)  (l-F  (v'4))  hfj,  for  i  =  1, . . .  ,  n. 


4.  Compute  the  updated  estimates  of  the  coefficients  as 

L=L+{^''^‘XyX'W.y-. 

5.  If  0^.  «  quit  and  let  $.  =  Otherwise,  replace  by  0^.  and 
return  to  step  3  above. 

6.  After  convergence,  compute  the  fit  at  v  =  Vi  as  P  (vp  =  F  (piP^ 

7.  After  the  fits  are  obtained  for  all  n  observations,  compute  the  PRESS*(6) 
statistic  as 

E  (vi-Pivpy 

PRESS*  (6)  =  — - ^ - pT-n - > 

where  P  (vj)_p  is  the  llogr  (local  logistic  regression)  fit  based  on  the 
current  bandwidth  b,  obtained  with  the  ith  data  point  {yt,  vp  removed 
from  the  data  set.  The  current  version  of  the  algorithm  obtains  P  {vi)_^ 
by  using  steps  1  through  6  above,  based  on  the  n  —  1  point  data  set. 
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8.  Determine  6*.  the  value  of  b  that  minimizes  PRESS*(/)).  This  can  be  ac¬ 
complished  either  through  a  search  routine  (the  algorithm  currently 
uses  a  binary  search  to  find  If)  or  by  finding  b*  through  a  candidate 
list  of  bandwidths. 

Upon  completion  of  steps  1  though  8,  the  value  of  b*  has  been  found  and 
the  n  fits  are  obtained  at  the  data  points  based  on  b*.  Often,  however,  the 
fitted  curve  is  desired  over  a  second  series  of  i’  values,  which  may  or  may 
not  include  the  original  data  points.  For  example,  the  predicted  probability 
of  response  may  be  desired  for  all  velocities  ranging  from  the  minimum 
observed  velocity  to  the  maximum  observed  velocity  at  every  10  ft/s.  In 
this  case,  steps  2  through  6  of  the  IRLS  algorithm  above  are  changed  only  by 
the  replacement  of  Vj  with  vq,  one  of  the  velocity  values  where  predictions 
are  desired. 


Appendix  B.  Program  Documentation  &  Comments 


Below  is  the  SAS  code,  preceded  by  a  few  notes,  to  fit  binary  data  via 
the  nonparametric  local  logistic  regression  procedure  (llogr)  presented  in 
this  report.  The  llogr  procedure  uses  PRESS*  as  the  method  for  bandwidth 
selection. 

Some  notes  about  the  program  are  presented  below. 

1.  Lines  6,  7,  and  10  read  the  data  from  a  file.  The  file  can  be  a  text  (.txt), 
print  (.prn),  or  flat  data  file.  The  program  is  set  up  to  read  a  file  con¬ 
taining  two  columns  of  data:  the  first  column  contains  the  velocity, 
and  the  second  column  is  the  response  (0  or  1). 

The  INFILE  statement  reads  the  data  set  from  its  stored  location,  de¬ 
noted  by  “fi  lename.filetype.” 

2.  Once  the  data  have  been  read  into  the  IML  procedure,  the  regressor 
(in  this  case,  the  velocity)  is  rescaled  such  that  the  values  are  between 
0  and  1.  The  program  then  finds  the  bandwidth  to  minimize  PRESS*. 
Once  the  bandwidth  has  been  found,  the  program  does  the  following: 

•  Obtains  the  llogr  fit  at  each  of  the  data  points  (lines  220-  248) 

•  Finds  the  lower  and  upper  V50  values  (lines  252-  452) 

•  Obtains  predictions  at  10-ft/s  intervals  (lines  461-  502) 

3.  The  program  outputs  the  following  (lines  518-  521): 

•  The  bandwidth  (BW). 

•  The  chi-square  statistic,  the  model  degrees  of  freedom,  and  the 
mean  squared  error,  denoted  by  CFIISQR,  MODELDF,  and  MSE, 
respectively. 

•  The  lower  V50  (F50Z,)  and  the  upper  V50  (V5017)  values. 

•  The  predictions  at  lO-ft/s  intervals:  Velocity  (VEL),  fitted  value 
(YHATO),  and  lower  and  upper  confidence  limits  for  the  fitted 
value  (LCLO  and  UCLO). 

4.  Once  the  output  is  obtained,  the  user  can  produce  a  plot  of  the  fitted 
curve  using  Excel  or  another  plotting  application.  ITowever,  the  SAS 
code  below  also  generates  a  plot  with  upper  and  lower  confidence 
bounds  in  Computer  Graphic  Metafile  (COM)  format.  The  code  is  on 
lines  538  to  552.  The  user  must  supply  the  filename  and  the  directory. 
DO  NOT  CHANGE  THE  EXTENSION.  Once  the  CGM  file  has  been 
created,  the  user  can  view  the  plot  in  MS  Word  or  another  application 
that  converts  CGM  files. 
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The  following  represents  the  directory,  filename,  and  filetype  of  the  data 
set.  The  data  set  should  be  in  text  form  or  a  flat  data  file. 

Options  ls=80  ps=64  nodate  nonunber; 

Titlel  ’Local  Logistic  Regression  Analysis’; 

Title2  ’using  Press*  to  Obtain  Banduidth’; 

DATA  HACRO; 

INFILL  "d:\research\ARHY\BACROCOHPOSITE  ORIGINAL  DATA.PRN"; 

Here  variables  are  being  read  from  the  data  set.  In  this  case,  they  are  veloc¬ 
ity  and  response. 

INPOT  VELOCITY  RESPONSE; 


DATA  LLOGR; 

SET  HACRO; 
RON; 

PROC  SORT; 

BY  VELOCITY; 
RON; 


The  following  sorts  the  data  set  by  velocity. 

PROC  IHL; 

Ose  LLOGR; 

Read  Ail  Into  X  Van  {VELOCITY}; 

Read  All  Into  PI  Var  {RESPONSE}; 

Now  the  velocity  is  being  converted/transformed  to  the  0  to  1  data  range. 
This  is  so  that  we  can  obtain  a  better  interpretation  of  the  bandwidth.  That 
is,  given  a  finite  range  for  the  bandwidth,  one  can  interpret  its  relative  size. 
However,  if  the  data  are  not  converted,  the  range  of  the  bandwidth  is  from 
0  to  infinity,  allowing  a  less  meaningful  interpretation. 

N=NR0W(X); 

HINX=HIN(X); 

HAXX=HAX(X); 

LENGTH=(HAXX-HINX)/10  M; 

VEL=J(LENGTH,1,0); 

V=0; 

DO  I=HINX  TO  HAXX  BY  10; 

V=VM; 

VEL[V,]=(I-HIHX)/(HAXX-HINX); 


X=a-Hin)/{HAXX-mM);  *  Transforms  X-variable  to  [0,1]  Range; 


m  niTIALIZATIOMS 

YHAT=J(1I,1,1); 

LCL=J(»,1,1); 

UCL=J(II,1.1); 

PRSS=J(3,1.0); 

HO=J(N,1,0); 

TYHATHI=J(N,1,1); 

♦♦♦  E»D  OF  IlflTIALIZATIOllS 

Here  begins  the  subroutine  to  obtain  the  true  PRESS  statistic  and  the  sub¬ 
sequent  bandwidth  found  to  minimize  this  PRESS  statistic. 

START  TRUEYHI; 

DO  1=1  TO  R; 

FREE  XHI; 

IF  1=1  THEN 
DO; 

XHI=X[2:II,]; 

PHI=PI[2:N,]; 

EHD; 

ELSE  IF  I=N  THEI 


XMM[l:li-l,]; 

PHI=PI[1:H,]; 

END; 

ELSE 

DO; 

XHI1=X[1;I-1,]; 

PHI1=PI[1:M,]; 

XHI2=X[W:I(,]; 

PHI2=PI[M:1I.]; 

X11I=XHIl//XtlI2; 

PMI=PHI1//PHI2; 

ERD; 

XHI=J(R-1,1,1)||XHI; 

X0=1||X[I,]; 

D0=X[I,]-Xn[,2]; 

DSQ0=D0H2; 

DINTO  =  (-l)ilDSq0/(BWI2); 
EXPDO  =  EXP(DIRTO); 


21 


ESUHO  =  SUIKEXPDO);  ♦  ROW  SUll  OF  IIUllERATOR  VALUES; 
KO  =  EXPDO/ESUflO; 


BHAT=INV(XHI'*DIAC(KO)+XllI)*XHr*DIAG(RO)*PllI; 

DO  ITER=1  TO  2; 
y=XIII*BHAT; 

P=l/(hEXP(-Y)); 

0=1-P; 

Z=EXP(-Y)/((1*EXP(-Y))f#2); 

W=DIAG(PJq)«IAG(KO); 

XPXI=INV(XHI'*W«XHI); 

BHAT=BHAT*XPXI^XHI'*W*((PHI-P)/Z); 

E(ID; 

TYHATHI[I,I]=l/(hEXP(-XO*BHAT)); 

END; 

FINISH; 

♦  Subroutine  to  begin  the  search  for  the 
bandwidth  that  minimizes  Press*; 

START  Bk'SEARCH; 

BWO  =  {0.1,  0.5,  0.7}; 

DO  A  =  1  TO  3; 

BW=BII0[A,]; 

D=X*J(1,N,1)-J(»,1,I)*X'; 

D=D‘; 

DS0=Df#2, 

DINT  =  (-l)#(DSI}/(BW»f2)#J(N,N,l)); 

EXPO  =  EXP(DINT); 

ESUIl  =  EXPDI,-*] ;  ♦  ROW  SOU  OE  NDHERATOR  VALOES; 

K  =  EXPD/(ES0H*J(1,N,1)); 

RUN  TROEYHT; 

PRSS[A,]  =  SI)H{(PI-TYHATHI)III2)/(N-TRACE(K)); 

END; 

♦  SECOND  LOOP  SEARCHES  FOR  THE  BANDWIDTH  TO  HINIHIZE  The  True  PRESS* 
ITERB=0; 

HINI=HIN(PRSS); 

HIHOLD=HAX(PRSS); 

DO  WHILE  (ABS(llIllI-HIN0LD)>lE-8); 

ITERB=ITERBM; 

HINOLD=HAX(PRSS); 

IF  Him  =  PRSSll,]  THEN 


SETl  =  BW0[1,]; 
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SET2  =  2»BH0[1,]-BW0[2J; 
SETS  =  BW0[l.]/2; 

VEC  =  SET3//SET2; 

BtfO[l.]  =  HAKVEC); 
Bt'0[3.]  =  BW0[2J; 

Btf0[2,]  =  SETl; 

EMD; 


IF  Hin  =  PRSS[2,]  THEN 


AKIN  =  2; 

HAXI  =  HAX(PRSS); 

IF  HAXI  =  PRSSllJ  THEN 
DO; 

AHAX  =  1; 

BWO[l,]  =  B«0[2.]; 

BWO[2j  =  (BW0[2,]  *  BH0[3,])/2; 
END; 

IF  HAXI  =  PR3S[3,]  THEN 
DO; 

AHAX  =  3; 

B«0[3J  =  BW0[2,]; 

BH0[2,]  =  (BW0[2,]  +  BW0[l,])/2; 
END; 

END; 

IF  HINI  =  PRSS[3J  THEN 
DO; 

AHIN  =  3; 

SETl  =  BW0[3,]; 

BHO[3,]  =  2(IBW0[3,]-BW0[2,]; 

BHO[l.]  =  BW0[2.]; 

BII0[2.]=SET1; 


*  OBTAIN  THE  LOCAL  LOGISTIC  FIT  FOR  THE  NEW  VALUE  OF  THE  BANDWIDTH*; 

BW  =  BWOUHIN.]; 

D=X*J(1,N,1)-J(N,1,1)*X‘; 

D=D‘; 

DSq=DM2; 

DINT  =  (-l)N(DSq/(BWI(#2)IJ(N,N,l)); 

EXPD  =  EXP(DINT); 
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ESUfl  -  EXPD[,^;  *  ROW  SUfi  OF  NUflERATOR  VALUES; 

K  =  EXPD/(ESUfiU(l,H,l))i 

RUfI  TRUEYHI; 

IIIHPRESS  =  SUH((PI-TYHAT11I)##2)/(N-TRACE(K)); 

*  VALUE  OF  PRESS  FOR  THE  CURREIIT  BAllDUIDTH  FOR  THIS  ITERATIOH^ 

*  RESET  POSITIOHS  OF  BAIiDWIDTH  AHD  PRSS  VECTORS  DEPEfIDIIiG  ON  THE  VALUE  OF 
PRESS  FOR  THE  CURRENT  VALUE  OF  THE  BAUDWIDTH^ 

IF  AHIfl  =  1  THEN 
DO; 

PRSS[3J  =  PRSS[2J; 

PRSS[2,]  =  PRSS[1J; 

PRSS[1J  =  HINPRESS; 

END; 

IF  AHIII  =  3  THEN 
DO; 

PRSS[1J  =  PRSS[2J; 

PRSS[2J  =  PRSS[3J; 

PRSS [3, ]  =  HINPRESS; 

END; 

IF  AHIN  =  2  THEN 
DO; 

IF  AHAX  =  1  THEN  PRSS[lJ  =  PRSS[2j; 

IF  AHAX  =  3  THEN  PRSS [3,]  =  PRSS [2  J; 

PRSS  [2 J  =  HINPRESS; 

END; 

HINI=HIN{PRSS); 

END;  SEARCH  FOR  OPTIHAL  BANDWIDTH  ENDS; 

FINISH; 


WGT=1; 

RUN  BWSEARCH; 

*  Having  obtained  the  bandwidth,  we  can  now  obtain  the  local  logistic  regression  fit*; 
X2=J(N,l,l)||X; 

H=J(N,1,1); 

Do  1=1  to  N; 

D=X[I,]-X; 

DS1)=D##2; 

Dint=(-l)#DSq/(BW#l2); 

Expd=Exp(Dint); 

Esiiiii=SUH(Expd); 
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R=Expd/Esuin; 
H[IJ  =K[I,]; 


BHAT=IIIV{X2‘*DIAG(K)*X2)*X2‘+DIAG(K)*PI; 

DO  ITER=1  TO  2; 

Y=X2*BHAT; 

P=1/(WXP(-Y)); 

q=l-P; 

Z=Pfq; 

W=DIAG(Pltq)ADIAG(K); 

XPXI=I»V(X2‘+W»X2); 

BHAT=bhat*XPXI*X2‘«*((PI-P)/Z); 

END; 

Y=X2*BHAT; 

P=l/(hEXP(-Y)); 

q=l-P; 

Z=EXP(-Y)/((hEXP(-Y))«l(2); 
W=DIAG(Piq)#DIAG(K);  • 

XPXI=INV(X2‘*M2); 

YHAT[I,]=1/(1+Exp(-X2[I,]*BHAT)); 

END; 

♦  ALGORITHM  FOR  COHPDTING  THE  LOWER  V50  VALUE*; 
COONT=0; 

X50=J(3,1,0); 

DO  1=1  TO  N; 

IF  C0DNT=O  THEN 
DO; 

IF  YHAT[I.]>0.5  THEN 
DO; 

X50[l,]=X[I,]-0.05; 

X50[2,]=X[I,]-0.025; 

X50[3,]=X[I,]; 

COUHT=l; 


END; 

SET50  =  J(3.1.0); 

♦  OBTAIN  PHAT  FOR  3  VALDES  OF  X  (DOSE)  *; 

DO  1=1  TO  3; 

D=X60[I,]-X; 

DSq=Dfl2; 

DINT=(-l)fDSq/(BWIf2); 


E)(PD=EXP(D]NT); 

ESUH=SOtl(EXPD); 

K=EXPD/ESOHi 

BHAT=INV(X2‘*DIAG(K)+X2)*X2'*DIAG(K)*PI; 
DO  ITER=1  TO  2; 
y=X2+BHAT; 

P=1/(1*EXP(-Y)); 

Q=l-P; 

Z=EXP(-Y)/((hEXP(-y))ll2)i 

W=DIAG(P«0)»DIAG(K); 

XPXI=I(IV(X2'*W*X2); 

BHAT=BHAT*XPXI*X2‘«*((PI-P)/Z); 

END; 

X0=1||X50[I,]; 

SET50[I,]=1/(1*EXP(-X0*BHAT)); 

END; 


PTEHP=0; 

B=0; 

DO  B=1  TO  200i*WHILE  (ABS(PTEHP-0,5)>IE-4); 
B=B*1; 

IF  SET50[2,]  >0.5  THEN 
DO; 

INTERVALS; 

SETl  =  X50[l.]; 

SET2  =  (X60[2,]*X50[l,])/2; 

SET3  =  X50[2,]; 

X50[l,]  =  SETl; 

X60[2,]  =  SET2; 

X50[3,]  =  SET3; 

END; 

IF  SET50[2,]  <0.5  THEN 
DO; 

INTERVAL=2; 

SETl  =  X50[2,]; 

SET2  =  (X50[3,]  *  X50[2,])/2; 

SET3  =  X50[3.]; 

X50[1.]=SET1; 

X50[2,]=SET2; 

X50[3,]=SET3, 

END; 
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♦  OBTAIN  YHAT  FOR  NEW  VELOCITY*; 


V50L  =  X50[2,]; 

D=V50L-Xi 

DSQ=Dff2; 

Dint=(-l)#DSq/(BH«2); 

Expd=Exp(Dint); 

Esuiii=SlJH(Expd); 

K=Expd/Esuni; 

BHAT=IiV(X2‘*DIAG(K)*X2)*X2‘*DIAG(K)*PI; 

DO  ITER=1  TO  2; 

Y=X2*BHAT; 

P=l/(hEXP(-Y)); 

q=l-P; 

Z=EXP(-Y)/((hEXP(-Y))#f2); 

W=DIAG(Piq)IDIAG(K); 

XPXI=INV(X2‘*W*X2); 

BHAT=BHAT+XPXI*X2‘*W*((PI-P)/Z) ; 

END; 

X0=1||V50L; 

PTERP=l/(hExp(-XO*BHAT)); 

IF  nTERVAL=l  THEN  SET50=SET50[1,]//PTEHP//SET50[2,]; 
IF  nTERVAL=2  THEN  SET50=SET60[2,]//PTEHP//SET50[3,]; 
END;  ♦  SEARCH  FOR  V50L  ENDS; 

IF  (ABS{PTEHP-0.5)>lE-2)  THEN  V50L=.; 

♦  ALGORITHM  FOR  COMPUTING  THE  UPPER  V50  VALUE  ♦; 

COONT=0; 

DO  M  TO  1  BY  (-1); 

IF  COUNTED  THEN 
DO; 

IF  YHAT[I,]<0.6  THEN 

DO; 

X50[1,]=X[I,]; 

X50[2,]=X[I,]*0.025; 

X50[3,]=X[I,]*0.05; 

CODNT=l; 

END; 


SET50  =  J(3,l,0); 
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*  OBTAIH  PHAT  FOR  3  VALUES  OF  X  (DOSEl  ♦; 


DO  1=1  TO  3; 

D=X50[I,]-X; 

DSI)=DfJ2; 

DI11T=(-1)#DS0/(BMI2); 

EXPD=EXP(DINT); 

ESDH=SDn(EXPD); 

K=EXPD/ESU11, 

BHAT=IUV(X2‘4DIAG(K)*X2)*X2‘*DIAC(K)*PIi 
DO  ITER=1  TO  2; 

Y=X2+BHAT; 

P=1/(1*EXP(-Y)); 

q=l-P; 

Z=EXP(-Y)/((hEXP(-Y))ll2); 

W=DIAG(PIQ)fDIAG(K); 

XPXI=mV(X2'«+X2); 

BHAT=BHAT*XPX1*X2‘*W*((PI-P)/Z); 

END; 

X0=1||X50[I,]; 

SET50[I,]=1/(KXP(-X0*BHAT)); 

EKD; 

PTEHP=0; 

B=0; 

DO  B=1  TO  200;  ♦WHILE  (ABS(PTEHP-0.5)>lE-4); 
B=B*1; 


IF  SET50[2J  >  0.5  THEN 
DO; 

INTERVAL=1; 

SETl  =  X50[l,]; 

SET2  =  (X50[2j*X50[l,])/2; 
SET3  =  X50[2,]; 

X50[l,]  =  SETl; 

X50[2,]  =  SET2; 

X60[3,]  =  SET3; 

END; 

IF  SET50[2.]  <0.5  THEN 
DO; 

INTERVAL=2; 

SETl  =  X50[2,]; 

SET2  =  (X60[3,]  +  X50[2,])/2; 
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SET3  =  X50[3,]; 

X50[1,]=SET1; 

X50[2,]=SET2; 

X50[3,]=SET3; 

END; 

♦  OBTAIN  THAT  FOR  NEW  VELOCITY*; 

V500  =  X50[2,]; 

D=V50D-X; 

DSI)=D##2; 

Dint=(-l)»DSq/(BMf#2); 

Expd=Exp(Dint) ; 

Esuin=SDH(Expd); 

K=Expd/Esuiii; 

BHAT=INV(X2'*DIAG(R)*X2)*X2‘*DIAG(K)*PI; 

DO  ITER=1  TO  2; 

Y=X2*BHAT; 

P=1/(1+EXP(-Y)); 

q=l-P; 

Z=EXP(-Y)/((1*EXP{-Y))«2); 

W=DIAG(P(iq)IDIAG(K); 

XPXI=INV(X2'*W*X2); 

BHAT=BHAT+XPXI +X2  ‘  *W*  ( (PI  -P )  /Z) ; 

END; 

X0=lllV50O; 

Y=X2*BHAT; 

P=l/(hEXP{-Y)); 

q=l-P; 

Z=0P(-Y)/((MXP{-Y))III2); 

W=DIAG(Pjq)IDIAG(K); 

XPXI=IBV(X2‘*W*X2); 

PTEBP=l/(l*Exp(-XO*BHAT)); 

IF  IITERVAL=1  THEN  SET50=SET50[1.]//PTEMP//SET60[2,] ; 
IF  INTERVAL=2  THEN  SET50=Sn50[2,]//PTEHP//SET50[3,] ; 
END;  *  SEARCH  FOR  V50D  ENDS; 

IF  {ABS{PTEHP-0.5)>lE-2)  THEN  V50H=.; 

HODELDF=SUH(H); 

DFE=N-HODELDF; 

CHISqR=SUH( ( (PI-YHAT)»I2) / (YHAT«(1-YHAT) ) ) ; 
HSE=CHISqR/DFE; 
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+♦♦  HAKE  PREDICTIOHS  AT  INCREHEHTS  OF  10  FROIl  Hill  VELOCITY  TO  HAX  VELOCITY 
YHAT0=J(LEIIGTH,1,0); 

LCL0=J{LEI1GTH,1,0); 

UCL0=J(LE(1GTH.1,0); 

DO  1=1  TO  LEHGTH, 

D=VEL[I,]-X; 

D=D'; 

DSH»«2; 

DIHT  =  (-l)fDSq/(B'Jif2), 

EXPO  =  EXP(DINT); 

ESDH  =  SUH(EXPD);  ♦  ROD  SOI!  OF  NUHERATOR  VALUES; 

K  =  EXPD/ESOH; 

BHAT=IP(X2'*DIAG(K)*X2)*X2'+DIAG(K)»PI; 

DO  iter  =  1  to  2, 

Y=X2*BHATi 

P=1/(1*EXP(-Y)); 

q=l-P; 

Z=Piqi 

W=DIAG(Ptq)#DIAG(K); 

XPXI=INV(X2‘«tX2); 

BHAT=BHAT*XPXI*X2‘*W*((PI-P)/Z); 

END; 

Y=X2tBHAT; 

P=1/(1*EXP(-Y)); 


Z=EXP(-Y)/({hEXP(-Y))«2); 

«=DIAG(Pfq)fDIAG(K), 

XPXI=m(X2'*W»X2); 

X0=1||VEL[I,]; 

YO=XO*BHATi 

YHATO[I.]=l/(hEXP(-YO)); 

PO=YHATO[I,]; 

V0=DIAG{K»l2)fDIAG(P#q); 
VBHATO=XPXI*X2‘*VO*X2+XPXI; 
Z0=EXP(-Y0)/((hEXP(-Y0))lf2); 
VYHATO=(ZOH2)IXO*VBHATO+XO' ; 

LCLO  [I , ] =YHATO [I , ] -1 . 96«SqRT(VYHAT0) ; 
UCLO  [I , ] =YHATO [I . ] *1 . 96*SqRT(VYHAT0) ; 
IF  LCL0[I,]<0  THEII  LCLO[I,]=0; 

IF  UCL0[I,]>1  THEII  UCL0[I,]=1; 

END; 
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VEL=VEL*(HAXX-MIIX)-HINX; 


C2={"X0''  "YHATO"  "LCLO"  "DCLO"}; 

PRED2=VEL||yHAT0||LCL0||UCL0; 

CREATE  YHAT2  FROM  PRED2[C0LNAHE=C2]; 

APPEND  FROM  PRED2; 

X=X*(HAXX-HINX)^I1INX; 

V60L=V50L*(HAXX-HINX)*HINX; 

V50U=V50D*(HAXX-HINX)+HI»X; 

Outputs  are  the  bandwidth  (BW),  Chi-square  statistic,  model  degrees  of 
freedom,  mean-squared  error,  V50  values,  and  the  data  (VEL,  YHATO,  LCLO, 
and  UCLO)  at  10-ft/s  intervals. 

Print  BW; 

Print  CHIS®  HODELDF  HSE  /; 

Print  V50L  V50U  /; 

PRINT  VEL  YHATO  LCLO  UCLO; 

The  following  creates  a  SAS  data  set  that  can  be  then  used  in  the  Proc  Glot 
procedure. 

Cl={"r  "PI"}; 

PRED1=X||PI; 

CREATE  YHATl  FROM  PRED1[C0LNAHE=C1] ; 

APPEND  FROM  PREDl; 

DATA  PHAT3; 

MERGE  YHATi  YHAT2; 

PROC  SORT; 

BYX; 

Below  is  the  code  to  create  a  CGM  format  graph.  This  format  gives  us  the 
capability  to  insert  the  plot  in  a  Word  file  or  any  other  application  that  reads 
the  CGM  file  type. 

FILENAME  GSASFILE  ’filename. cgm’; 

GOPTIONS  RESET=GOPTIONS  DEVICE=CGH  GACCESS=GSASFILE 
GPR0T0C0L=SASGPASC  GSFLEN=80; 

SYMBOLl  VALUE=D0T; 

SYHB0L2  INTERP0L=J0IN  LINE=1  VALUE=NONE; 

SYHB0L3  INTERP0L=J0IN  VALUE=NONE  LINE=4; 

SYHB0L4  INTERP0L=J0IN  VALUE=NONE  LINE=4; 

AXISl  LABEL=("VELOCITY,  fps"); 

AXIS2  0RDER={0  TO  i  BY  0.1) 

LABEL=(R0TATE=90  ANGLE-9D  "RESPONSE"); 

PROC  GPLOT  DATA=PHAT3; 
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PLOT  ?M  YHATO=^XO  UCLO+XO  LCLO+XO  /  OVERLAY  HAXIS=AXIS1  VAXIS-AXIS2 
TITLEl  J=C  H=2  local  Logistic  Regression  Analysis’; 

TITLE2  J=C  H=2  ’All  flacrocoiiiposite  Data’; 
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